[fix](fe) Prevent pushing other join conditions to right child of null-aware anti join when hash conjuncts are empty #64898

Open
starocean999 wants to merge 2 commits into apache:master from starocean999:master_0626


@starocean999 (Contributor) commented Jun 26, 2026

When PushDownJoinOtherCondition processes a NULL_AWARE_LEFT_ANTI_JOIN
with empty hash join conjuncts, it incorrectly pushes the other join conditions
that reference only the right-side slots down to the right child. This changes
the evaluation semantics of the NOT IN subquery, because filtering the right
child before the null-aware anti join eliminates NULL values prematurely.

For a NULL_AWARE_LEFT_ANTI_JOIN, the hash join conjuncts represent the
correlation between the outer and inner tables. When there is no correlation
(i.e., hash conjuncts are empty), the other join conditions must not be pushed
to the right child, as doing so would filter rows from the subquery result
before the NULL check, potentially producing incorrect results.

Example affected query:
SELECT COUNT(*) FROM (SELECT 1 AS x) t
WHERE 1 NOT IN (SELECT CAST(NULL AS INT));

The subquery returns NULL, so NOT IN evaluates to UNKNOWN (NULL): no row matches
and COUNT returns 0. Pushing the other condition down to the right child, however,
filters out the NULL row before the null-aware anti join can see it, which changes
this behavior and produces a wrong result.
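To make the semantic difference concrete, here is a minimal Java sketch that models SQL three-valued logic for `x NOT IN (subquery)`, using `null` for UNKNOWN. It assumes, for illustration only, that the other join condition is the equality `1 = <subquery column>`; the class and method names (`NotInSemantics`, `countMatching`, `countWithPushedFilter`) are hypothetical and are not Doris code. The "pushed" variant drops every right-side row whose equality is not TRUE, which is how a filter placed below the join behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of SQL three-valued logic for `value NOT IN (rhs)`.
// Boolean.TRUE / Boolean.FALSE mean TRUE / FALSE; null means UNKNOWN.
final class NotInSemantics {

    static Boolean notIn(Integer value, List<Integer> rhs) {
        if (rhs.isEmpty()) {
            return Boolean.TRUE; // NOT IN over an empty set is always TRUE
        }
        if (value == null) {
            return null; // NULL NOT IN (non-empty set) is UNKNOWN
        }
        boolean sawNull = false;
        for (Integer v : rhs) {
            if (v == null) {
                sawNull = true; // comparing with NULL yields UNKNOWN
            } else if (v.equals(value)) {
                return Boolean.FALSE; // a definite match makes NOT IN FALSE
            }
        }
        return sawNull ? null : Boolean.TRUE;
    }

    // WHERE keeps the single outer row only if the predicate is TRUE.
    static int countMatching(Integer value, List<Integer> rhs) {
        return Boolean.TRUE.equals(notIn(value, rhs)) ? 1 : 0;
    }

    // Simulates pushing the right-only condition `value = v` below the join:
    // rows where the equality is not TRUE (including NULL rows) are dropped first.
    static int countWithPushedFilter(Integer value, List<Integer> rhs) {
        List<Integer> filtered = new ArrayList<>();
        for (Integer v : rhs) {
            if (v != null && v.equals(value)) {
                filtered.add(v);
            }
        }
        return countMatching(value, filtered);
    }

    public static void main(String[] args) {
        List<Integer> subquery = Arrays.asList((Integer) null); // SELECT CAST(NULL AS INT)
        System.out.println(countMatching(1, subquery));         // 0
        System.out.println(countWithPushedFilter(1, subquery)); // 1
    }
}
```

In this simplified model the correct evaluation counts 0 rows, while the pushed-down variant sees an empty right side, evaluates NOT IN to TRUE, and counts 1.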

Fix:
Add a guard in PushDownJoinOtherCondition to keep the other condition as
a remaining other conjunct (not pushed to the right child) when the join type
is NULL_AWARE_LEFT_ANTI_JOIN and hash join conjuncts are empty.
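The guard can be sketched as a single placement decision. The Java snippet below is a simplified, hypothetical model, not the actual PushDownJoinOtherCondition code: `RightPushdownGuard`, `canPushToRightChild`, and the set of join types that normally allow right-side pushdown are assumptions made for illustration. A conjunct goes to the right child only if it references right-side slots exclusively and the join is not a NULL_AWARE_LEFT_ANTI_JOIN with empty hash join conjuncts.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the right-child pushdown check; the names
// are illustrative and are not the actual Doris FE classes or methods.
final class RightPushdownGuard {

    enum JoinType { INNER_JOIN, LEFT_SEMI_JOIN, LEFT_ANTI_JOIN, NULL_AWARE_LEFT_ANTI_JOIN, RIGHT_OUTER_JOIN }

    // Assumed set of join types that allow pushing a right-only conjunct
    // into the right child (RIGHT_OUTER_JOIN preserves the right side, so it is excluded).
    static final Set<JoinType> PUSH_RIGHT_TYPES = EnumSet.of(
            JoinType.INNER_JOIN, JoinType.LEFT_SEMI_JOIN,
            JoinType.LEFT_ANTI_JOIN, JoinType.NULL_AWARE_LEFT_ANTI_JOIN);

    static boolean canPushToRightChild(JoinType joinType, List<String> hashJoinConjuncts,
                                       Set<String> conjunctSlots, Set<String> rightSlots) {
        if (!PUSH_RIGHT_TYPES.contains(joinType) || !rightSlots.containsAll(conjunctSlots)) {
            return false;
        }
        // The guard: a null-aware anti join without hash conjuncts must see the
        // unfiltered right side, so the conjunct stays as an other join conjunct.
        if (joinType == JoinType.NULL_AWARE_LEFT_ANTI_JOIN && hashJoinConjuncts.isEmpty()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> right = Set.of("c");
        // `1 = c` references only the right-side slot c.
        System.out.println(canPushToRightChild(JoinType.NULL_AWARE_LEFT_ANTI_JOIN, List.of(), Set.of("c"), right)); // false
        System.out.println(canPushToRightChild(JoinType.LEFT_ANTI_JOIN, List.of(), Set.of("c"), right));             // true
    }
}
```

Keeping the check as one early return leaves the existing pushdown behavior unchanged for every other join type and for null-aware anti joins that do have hash conjuncts.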

Related PR: #16774

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@starocean999 (Contributor, Author)

/review

@github-actions (bot) left a comment

Review summary for PR #64898:

I reviewed the three changed files and the surrounding Nereids null-aware anti join rewrite/translation path. I did not find a substantiated blocking issue, so I am leaving this as a summary-only review with no inline comments.

Critical checkpoint conclusions:

  • Goal and proof: the PR prevents PushDownJoinOtherCondition from pushing a right-child-only predicate below a NULL_AWARE_LEFT_ANTI_JOIN when hashJoinConjuncts is empty. The added qt_select65 covers the constant-vs-null subquery case and expects COUNT(*) = 0.
  • Scope: the code change is small and focused on the affected pushdown rule, with a matching regression output update.
  • Parallel paths: checked InApplyToJoin, FindHashConditionForJoin, JoinUtils.JoinSlotCoverageChecker, repeated RuleSet.PUSH_DOWN_FILTERS use, and physical hash/nested-loop translation. I did not find another path that would still push the same no-hash NAAJ right-only comparison into the build child.
  • Session/config behavior: no new configuration item is added. The regression suite does not set planner variables, but SessionVariable defaults keep Nereids enabled and fallback disabled, so the new case should exercise the Nereids rule path under default regression settings.
  • Concurrency, lifecycle, persistence, data writes, FE-BE protocol compatibility, and observability: not materially affected by this PR.
  • Test/output review: the new query returns a single aggregate row, so ordering is deterministic, and the .out label/result matches the suite case.
  • Validation limit: I did not run FE build or the regression suite locally because this runner lacks thirdparty/installed and thirdparty/installed/bin/protoc. I did run git diff --check on the three PR files, which passed.

User focus: no additional user-provided review focus was present.

Subagent conclusions:

  • optimizer-rewrite reported NO_NEW_VALUABLE_FINDINGS; no optimizer candidate became an inline comment.
  • tests-session-config reported NO_NEW_VALUABLE_FINDINGS; no test/session candidate became an inline comment.
  • Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same current ledger and empty proposed final inline comment set.

@starocean999 (Contributor, Author)

run buildall

@hello-stephen (Contributor)
TPC-H: Total hot run time: 29036 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 09ffce185d033d49869f6519fe3e8c23c81f061e, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17610	3931	4128	3931
q2	1988	307	193	193
q3	10317	1407	810	810
q4	4689	464	334	334
q5	7492	865	581	581
q6	182	172	140	140
q7	761	823	640	640
q8	9359	1691	1621	1621
q9	6301	4495	4516	4495
q10	6804	1789	1535	1535
q11	436	274	249	249
q12	644	420	291	291
q13	18131	3310	2694	2694
q14	264	259	244	244
q15	q16	790	765	710	710
q17	926	883	926	883
q18	6882	5807	5623	5623
q19	1481	1270	1104	1104
q20	507	396	284	284
q21	5768	2635	2375	2375
q22	420	350	299	299
Total cold run time: 101752 ms
Total hot run time: 29036 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4304	4240	4271	4240
q2	321	347	225	225
q3	4589	5000	4416	4416
q4	2047	2157	1368	1368
q5	4422	4309	4297	4297
q6	227	173	133	133
q7	1706	1697	1907	1697
q8	2527	2091	2132	2091
q9	8085	8111	8181	8111
q10	4785	4755	4254	4254
q11	574	417	398	398
q12	748	745	532	532
q13	3421	3604	2967	2967
q14	322	311	292	292
q15	q16	734	722	686	686
q17	1366	1313	1317	1313
q18	7944	7331	6985	6985
q19	1125	1082	1116	1082
q20	2217	2248	1979	1979
q21	5232	4560	4386	4386
q22	510	470	412	412
Total cold run time: 57206 ms
Total hot run time: 51864 ms

@hello-stephen (Contributor)
TPC-DS: Total hot run time: 171437 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 09ffce185d033d49869f6519fe3e8c23c81f061e, data reload: false

query5	4318	641	479	479
query6	443	193	166	166
query7	4806	565	304	304
query8	329	180	165	165
query9	8779	4044	4005	4005
query10	433	300	284	284
query11	5922	2373	2129	2129
query12	164	102	99	99
query13	1254	624	429	429
query14	6266	5270	4951	4951
query14_1	4261	4253	4280	4253
query15	223	202	184	184
query16	982	451	460	451
query17	1145	761	581	581
query18	2476	473	346	346
query19	207	194	147	147
query20	113	110	107	107
query21	222	140	126	126
query22	13644	13560	13453	13453
query23	17381	16469	16284	16284
query23_1	16542	16296	16209	16209
query24	7469	1774	1300	1300
query24_1	1330	1290	1311	1290
query25	583	460	397	397
query26	1307	329	170	170
query27	2655	589	359	359
query28	4438	2028	2022	2022
query29	1065	613	495	495
query30	310	239	200	200
query31	1125	1079	955	955
query32	109	64	60	60
query33	514	323	259	259
query34	1192	1154	680	680
query35	753	790	680	680
query36	1427	1422	1251	1251
query37	156	110	94	94
query38	1879	1712	1668	1668
query39	917	919	898	898
query39_1	909	935	890	890
query40	216	117	97	97
query41	64	72	61	61
query42	94	84	88	84
query43	319	322	276	276
query44	1468	790	795	790
query45	207	190	179	179
query46	1103	1241	732	732
query47	2350	2364	2238	2238
query48	369	428	316	316
query49	583	417	304	304
query50	968	357	263	263
query51	4445	4398	4355	4355
query52	80	79	69	69
query53	252	258	195	195
query54	263	216	214	214
query55	73	77	66	66
query56	231	215	232	215
query57	1436	1408	1333	1333
query58	235	212	205	205
query59	1649	1714	1463	1463
query60	277	234	225	225
query61	152	144	151	144
query62	720	633	587	587
query63	229	184	195	184
query64	2493	764	583	583
query65	4852	4762	4749	4749
query66	1798	463	346	346
query67	28889	28731	28609	28609
query68	3145	1532	889	889
query69	450	309	260	260
query70	1018	959	945	945
query71	281	236	215	215
query72	2799	2612	2343	2343
query73	838	791	451	451
query74	5098	4959	4744	4744
query75	2584	2532	2161	2161
query76	2333	1198	763	763
query77	365	391	279	279
query78	12498	12361	11814	11814
query79	1405	1217	735	735
query80	1307	468	399	399
query81	513	279	235	235
query82	576	155	118	118
query83	360	274	244	244
query84	260	144	115	115
query85	899	504	409	409
query86	414	294	290	290
query87	1903	1826	1768	1768
query88	3760	2812	2777	2777
query89	432	384	333	333
query90	1987	180	175	175
query91	168	159	132	132
query92	64	63	55	55
query93	1544	1420	995	995
query94	713	352	311	311
query95	691	392	358	358
query96	1096	807	337	337
query97	2712	2700	2560	2560
query98	210	206	200	200
query99	1182	1154	1023	1023
Total cold run time: 257920 ms
Total hot run time: 171437 ms

@hello-stephen (Contributor)
ClickBench: Total hot run time: 25.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 09ffce185d033d49869f6519fe3e8c23c81f061e, data reload: false

query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.25	0.14	0.13
query4	1.61	0.14	0.12
query5	0.24	0.23	0.22
query6	1.24	1.08	1.09
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.38	0.33	0.31
query10	0.54	0.56	0.54
query11	0.20	0.13	0.13
query12	0.18	0.15	0.15
query13	0.47	0.47	0.46
query14	1.01	1.00	1.01
query15	0.60	0.59	0.58
query16	0.34	0.33	0.32
query17	1.08	1.08	1.12
query18	0.23	0.21	0.22
query19	2.05	1.94	1.85
query20	0.02	0.01	0.02
query21	15.43	0.24	0.14
query22	4.85	0.05	0.05
query23	16.13	0.31	0.13
query24	2.85	0.41	0.34
query25	0.11	0.06	0.04
query26	0.72	0.21	0.14
query27	0.04	0.04	0.04
query28	3.55	0.89	0.53
query29	12.49	4.32	3.45
query30	0.27	0.15	0.15
query31	2.77	0.59	0.31
query32	3.22	0.59	0.48
query33	3.17	3.18	3.16
query34	15.55	4.20	3.51
query35	3.50	3.48	3.48
query36	0.54	0.43	0.43
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.17	0.15	0.15
query41	0.09	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 96.35 s
Total hot run time: 25.04 s

@hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 66.67% (2/3) 🎉
Increment coverage report
Complete coverage report

@hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 2.90% (2/69) 🎉
Increment coverage report
Complete coverage report

@hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 0.94% (2/213) 🎉
Increment coverage report
Complete coverage report

@starocean999 marked this pull request as ready for review June 30, 2026 07:12
@morrySnow (Contributor)

add related PR

@starocean999 (Contributor, Author)

/review

@starocean999 (Contributor, Author)

run buildall

@github-actions (bot) left a comment

Automated review summary: I reviewed the changed rule and regression coverage and did not find a new blocking issue.

Critical checkpoints:

  • Goal and proof: the PR prevents right-side otherJoinConjunct pushdown for hash-empty NULL_AWARE_LEFT_ANTI_JOIN, preserving NOT IN semantics when the subquery side contains NULL. The added qt_select65 regression covers 1 NOT IN (SELECT CAST(NULL AS INT)) and expects count 0.
  • Scope and clarity: the code change is narrow and now includes the explanatory comment requested in the existing inline thread.
  • Parallel paths: checked InApplyToJoin, FindHashConditionForJoin, PushDownFilterThroughJoin, JoinExtractOrFromCaseWhen, nested-loop translation, and null-aware nested-loop handling; I did not find another path that reintroduces the same premature right-side filtering.
  • Concurrency, lifecycle, configs, persistence, FE/BE protocol compatibility, and observability: not materially touched by this PR.
  • Test/output review: the added result is deterministic. git diff --check is clean for the PR range.

Validation limit: I did not run the targeted regression suite or FE build because this runner is missing .worktree_initialized, thirdparty/installed, and thirdparty/installed/bin/protoc.

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite and tests-session-config both found no candidate inline comments in their initial passes. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same zero-inline final comment set.

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 33.33% (1/3) 🎉
Increment coverage report
Complete coverage report

@hello-stephen (Contributor)
TPC-H: Total hot run time: 29464 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e8a6b5529fb2d268f29730f9575f16d8fa7c9fd3, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17648	4005	4017	4005
q2	2079	319	189	189
q3	10323	1499	836	836
q4	4679	482	345	345
q5	7545	868	575	575
q6	182	171	143	143
q7	779	838	617	617
q8	9350	1993	1684	1684
q9	5676	4467	4436	4436
q10	6762	1823	1529	1529
q11	441	276	243	243
q12	629	428	299	299
q13	18074	3374	2761	2761
q14	272	262	238	238
q15	q16	791	784	720	720
q17	995	925	1029	925
q18	7339	5820	5601	5601
q19	1299	1202	1127	1127
q20	482	412	278	278
q21	6457	2852	2601	2601
q22	473	412	312	312
Total cold run time: 102275 ms
Total hot run time: 29464 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	5108	4779	4714	4714
q2	298	331	211	211
q3	4952	5417	4728	4728
q4	2114	2147	1389	1389
q5	4987	4705	4671	4671
q6	232	177	148	148
q7	1948	1746	1553	1553
q8	2414	2125	2080	2080
q9	7704	7304	7259	7259
q10	4743	4660	4230	4230
q11	534	388	356	356
q12	735	742	531	531
q13	2976	3331	2836	2836
q14	271	274	262	262
q15	q16	682	697	608	608
q17	1276	1256	1256	1256
q18	7209	6783	6991	6783
q19	1133	1050	1094	1050
q20	2215	2226	1954	1954
q21	5352	4572	4522	4522
q22	529	473	428	428
Total cold run time: 57412 ms
Total hot run time: 51569 ms

@hello-stephen (Contributor)
TPC-DS: Total hot run time: 173244 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e8a6b5529fb2d268f29730f9575f16d8fa7c9fd3, data reload: false

query5	4328	649	483	483
query6	453	190	178	178
query7	4815	554	318	318
query8	343	184	176	176
query9	8798	4049	4042	4042
query10	436	311	261	261
query11	5957	2337	2156	2156
query12	159	98	94	94
query13	1245	589	387	387
query14	6311	5272	4954	4954
query14_1	4280	4264	4246	4246
query15	218	199	178	178
query16	1015	480	448	448
query17	1095	687	561	561
query18	2717	472	335	335
query19	208	200	149	149
query20	127	108	108	108
query21	223	151	113	113
query22	13579	13697	13364	13364
query23	17412	16591	16245	16245
query23_1	16316	16239	16282	16239
query24	7560	1804	1300	1300
query24_1	1327	1304	1314	1304
query25	580	465	400	400
query26	1299	345	174	174
query27	2543	582	346	346
query28	4396	2071	2062	2062
query29	1079	650	521	521
query30	311	244	201	201
query31	1122	1072	943	943
query32	124	64	62	62
query33	541	324	261	261
query34	1197	1150	647	647
query35	774	785	669	669
query36	1403	1390	1250	1250
query37	161	110	101	101
query38	1881	1729	1649	1649
query39	933	944	884	884
query39_1	884	890	880	880
query40	236	127	107	107
query41	72	69	68	68
query42	92	91	92	91
query43	331	323	288	288
query44	1429	814	788	788
query45	203	189	177	177
query46	1114	1217	728	728
query47	2431	2380	2333	2333
query48	406	426	313	313
query49	588	444	329	329
query50	997	361	275	275
query51	4420	4415	4344	4344
query52	86	81	72	72
query53	256	274	192	192
query54	284	236	215	215
query55	83	80	66	66
query56	264	240	224	224
query57	1447	1420	1342	1342
query58	259	229	229	229
query59	1588	1697	1451	1451
query60	309	265	242	242
query61	196	149	151	149
query62	697	648	595	595
query63	228	183	192	183
query64	2490	763	588	588
query65	4878	4773	4767	4767
query66	1745	470	338	338
query67	29781	29666	29559	29559
query68	3190	1628	962	962
query69	410	299	274	274
query70	1102	995	984	984
query71	289	241	218	218
query72	2866	2630	2619	2619
query73	821	808	422	422
query74	5137	4959	4775	4775
query75	2574	2552	2167	2167
query76	2325	1239	788	788
query77	355	374	291	291
query78	12393	12547	11954	11954
query79	1392	1145	785	785
query80	962	477	389	389
query81	509	287	248	248
query82	1310	161	120	120
query83	364	280	255	255
query84	275	145	114	114
query85	926	509	408	408
query86	497	295	289	289
query87	1829	1826	1751	1751
query88	3713	2824	2815	2815
query89	440	382	324	324
query90	1858	176	179	176
query91	175	173	135	135
query92	65	60	55	55
query93	1502	1557	896	896
query94	648	355	300	300
query95	687	469	362	362
query96	1113	786	361	361
query97	2737	2740	2595	2595
query98	216	206	199	199
query99	1200	1169	1037	1037
Total cold run time: 259381 ms
Total hot run time: 173244 ms

@hello-stephen (Contributor)
ClickBench: Total hot run time: 25.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e8a6b5529fb2d268f29730f9575f16d8fa7c9fd3, data reload: false

query1	0.00	0.00	0.01
query2	0.09	0.05	0.06
query3	0.26	0.15	0.13
query4	1.60	0.14	0.14
query5	0.24	0.21	0.21
query6	1.23	1.05	1.08
query7	0.03	0.01	0.00
query8	0.06	0.04	0.04
query9	0.38	0.31	0.32
query10	0.55	0.59	0.54
query11	0.19	0.14	0.14
query12	0.18	0.14	0.14
query13	0.46	0.48	0.47
query14	1.01	1.01	1.01
query15	0.62	0.60	0.59
query16	0.31	0.32	0.32
query17	1.07	1.12	1.10
query18	0.23	0.20	0.21
query19	2.07	1.93	2.02
query20	0.02	0.01	0.01
query21	15.45	0.23	0.13
query22	4.82	0.06	0.05
query23	16.12	0.30	0.13
query24	3.03	0.42	0.35
query25	0.12	0.05	0.04
query26	0.72	0.20	0.17
query27	0.04	0.05	0.04
query28	3.54	0.89	0.54
query29	12.48	4.30	3.44
query30	0.26	0.15	0.17
query31	2.77	0.61	0.31
query32	3.22	0.60	0.49
query33	3.17	3.24	3.22
query34	15.69	4.18	3.55
query35	3.54	3.52	3.60
query36	0.56	0.43	0.41
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.16
query41	0.09	0.03	0.04
query42	0.04	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 96.66 s
Total hot run time: 25.3 s
